A computational auditory scene analysis system for speech segregation and robust speech recognition

Authors

  • Yang Shao
  • Soundararajan Srinivasan
  • Zhaozhang Jin
  • DeLiang Wang
Abstract

A conventional automatic speech recognizer does not perform well in the presence of multiple sound sources, while human listeners are able to segregate and recognize a signal of interest through auditory scene analysis. We present a computational auditory scene analysis system for separating and recognizing target speech in the presence of competing speech or noise. We estimate, in two stages, the ideal binary time-frequency (T-F) mask, which retains the mixture in a local T-F unit if and only if the target is stronger than the interference within the unit. In the first stage, we use harmonicity to segregate the voiced portions of individual sources in each time frame based on multipitch tracking. Additionally, unvoiced portions are segmented based on an onset/offset analysis. In the second stage, speaker characteristics are used to group the T-F units across time frames. The resulting masks are used in an uncertainty decoding framework for automatic speech recognition. We evaluate our system on a speech separation challenge and show that our system yields substantial improvement over the baseline performance.

Index Terms: speech segregation, computational auditory scene analysis, binary time-frequency mask, robust speech recognition, uncertainty decoding
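The ideal binary mask defined in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes the target and interference energies of every T-F unit are already known (in practice they come from a filterbank analysis of the premixed signals, which is not shown), and the function names and the `lc_db` local-SNR threshold are hypothetical. With `lc_db = 0`, a unit is kept exactly when the target energy exceeds the interference energy, matching the "if and only if the target is stronger" definition.

```python
from typing import List

Matrix = List[List[float]]


def ideal_binary_mask(target: Matrix, interference: Matrix, lc_db: float = 0.0) -> List[List[int]]:
    """Return 1 for each T-F unit whose local SNR exceeds lc_db, else 0.

    target[c][m] and interference[c][m] are non-negative energies of
    frequency channel c in time frame m (hypothetical layout).
    """
    if len(target) != len(interference):
        raise ValueError("target and interference must have the same shape")
    ratio = 10.0 ** (lc_db / 10.0)  # dB threshold converted to an energy ratio
    mask = []
    for t_row, n_row in zip(target, interference):
        if len(t_row) != len(n_row):
            raise ValueError("target and interference must have the same shape")
        # Compare energies directly rather than taking log10, so units with
        # zero interference (or zero target) need no special handling.
        # Ties (target == ratio * interference) are discarded.
        mask.append([1 if t > ratio * n else 0 for t, n in zip(t_row, n_row)])
    return mask


def apply_mask(mixture: Matrix, mask: List[List[int]]) -> Matrix:
    """Keep mixture values in units labeled 1 and zero out the rest."""
    return [[x * b for x, b in zip(x_row, b_row)] for x_row, b_row in zip(mixture, mask)]


# Usage: 2 channels x 2 frames.
target = [[4.0, 1.0], [0.0, 9.0]]
interference = [[1.0, 2.0], [1.0, 0.0]]
mask = ideal_binary_mask(target, interference)         # [[1, 0], [0, 1]]
masked = apply_mask([[5.0, 3.0], [1.0, 9.0]], mask)     # [[5.0, 0.0], [0.0, 9.0]]
```

Raising `lc_db` above 0 dB makes the mask more conservative, keeping only units where the target clearly dominates.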

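The abstract says the estimated masks feed an uncertainty decoding recognizer but does not state the decoding rule. A widely used form of uncertainty decoding, assumed here rather than taken from the paper, gives each reconstructed feature a per-dimension uncertainty variance and adds it to the acoustic model's Gaussian variance before scoring, i.e. it evaluates N(x̂; μ, σ² + σ_u²). The sketch below computes that log-likelihood for one diagonal-covariance Gaussian; the function name and arguments are hypothetical, and how the uncertainties are derived from the mask is not shown.

```python
import math
from typing import Sequence


def uncertain_log_likelihood(
    x_hat: Sequence[float],
    mean: Sequence[float],
    var: Sequence[float],
    obs_var: Sequence[float],
) -> float:
    """Log N(x_hat; mean, var + obs_var) for a diagonal-covariance Gaussian.

    x_hat   -- reconstructed (uncertain) feature vector
    mean    -- mean of one acoustic-model Gaussian
    var     -- variances of that Gaussian (all > 0)
    obs_var -- per-dimension uncertainty of x_hat (all >= 0)
    """
    if not (len(x_hat) == len(mean) == len(var) == len(obs_var)):
        raise ValueError("all vectors must have the same length")
    total = 0.0
    for x, m, v, u in zip(x_hat, mean, var, obs_var):
        s = v + u  # model variance inflated by the feature uncertainty
        total += -0.5 * (math.log(2.0 * math.pi * s) + (x - m) ** 2 / s)
    return total


# Usage: a feature three standard deviations from the mean is penalized
# less once it is marked as uncertain.
certain = uncertain_log_likelihood([3.0], [0.0], [1.0], [0.0])
uncertain = uncertain_log_likelihood([3.0], [0.0], [1.0], [3.0])
```

With all uncertainties set to zero this reduces to the ordinary Gaussian log-likelihood; larger uncertainties flatten the Gaussian so that unreliable features count for less when comparing states.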
Similar resources

Problems of concurrent speech sound segregation in hearing-impaired children

Objective: This study was a basic investigation of the ability of concurrent speech segregation in hearing impaired children. Concurrent segregation is one of the fundamental components of auditory scene analysis and plays an important role in speech perception. In the present study, we compared auditory late responses or ALRs between hearing impaired and normal children. Materials & Methods...

Monaural segregation of voiced speech using discriminative random fields

Techniques for separating speech from background noise and other sources of interference have important applications for robust speech recognition and speech enhancement. Many traditional computational auditory scene analysis (CASA) based approaches decompose the input mixture into a time-frequency (T-F) representation, and attempt to identify the T-F units where the target energy dominates tha...

Challenge Problem for Computational Auditory Scene Analysis: Understanding Three Simultaneous Speeches

Understanding three simultaneous speeches is proposed as a challenge problem to foster artificial intelligence, speech and sound understanding or recognition, and computational auditory scene analysis research. Automatic speech recognition under noisy environments is attacked by speech enhancement techniques such as noise reduction and speaker adaptation. However, the signal-to-noise ratio of sp...

An Auditory Scene Analysis Approach to Monaural Speech Segregation

A human listener has the remarkable ability to segregate an acoustic mixture and attend to a target sound. This perceptual process is called auditory scene analysis (ASA). Moreover, the listener can accomplish much of auditory scene analysis with only one ear. Research in ASA has inspired many studies in computational auditory scene analysis (CASA) for sound segregation. In this chapter we intr...

Handling Missing Data in Speech Recognition (Martin Cooke, Phil Green and Malcolm Crawford)

In this paper, we propose a new paradigm for robust ASR based on auditory scene analysis. In previous work, we have shown how models of auditory processing and grouping principles can be used to separate the evidence for a speech signal from arbitrary intrusions. However, this evidence will generally be incomplete since some spectrotemporal regions will be dominated by the other sources. Here, ...

Journal:
  • Computer Speech & Language

Volume 24

Published 2010